Stacked Graphical Learning
نویسندگان
چکیده
In reality there are many relational datasets in which both features of instances and the relationships among the instances are recorded, such as hyperlinked web pages, scientific literature with citations, and social networks. Collective classification has been widely used to classify a group of related instances simultaneously. Recently there have been several studies on statistical relational learning for collective classification, including relational dependency networks, relational Markov networks , and Markov logic networks. In statistical relational learning models, collective classification is usually formulated as an inference problem over graphical models. Hence the existing collective classification methods are expensive due to the iterative inference procedure required for general graphical models. Procedures that learn collective classifiers are also expensive, especially if they are based on iterative optimization of an expensive iterative inference procedure. Our goal is to develop an efficient model for collective classification for relational datasets. In my thesis, I have studied a learning scheme called stacked graphical learning. In stacked graphical learning, a base learner is augmented by providing the predicted labels of relevant instances. That is, first, a base learner is applied to the training data to make predictions using a cross validation-like technique. Then we expand the features by adding the predictions of relevant examples into the feature vector. Finally the base learner is applied to the expanded feature sets to make the final predictions. The intuition behind stacked graphical learning is that combining the predictions on the neighbors with local features can capture the dependencies among examples; hence we can rely on the base learner to classify the instances using the expanded feature set. We have applied stacked graphical learning to many real problems including collective classification, sequential partitioning, information extraction, and multi-task problems in an information extraction system. Stacked graphical learning has been demonstrated to achieve competitive performance to the state-of-art relational graphical models with much less inference time. In addition to exploring many applications of stacked graphical learning in real problems, we formally analyze an idealized version of the algorithm, which can be formulate as an inhomogeneous Gibbs sampling process with parameters learned in a greedy manner, provide proof of convergence of the idealized version of stacking, and discuss the conditions under which the algorithm of stacked graphical learning is nearly identical to the idealized stacked graphical learning. We also studied an online version of stacked graphical learning, which integrates a single-pass online learning algorithm Modified Balanced Winnow with stacked learning. Online stacked graphical learning can save training time and is capable to handle large streaming datasets with minimal memory overhead. We analyze the time and memory cost of online stacked graphical learning and applied it to several real problems.
منابع مشابه
Stacked Graphical Learning: Learning in Markov Random Fields using Very Short Inhomogeneous Markov Chains
We described stacked graphical learning, a meta-learning scheme in which a base learner is augmented by expanding one instance’s features with predictions on other related instances. The stacked graphical learning is efficient, especially during inference, capable of capturing dependencies easily, and can be constructed based on any kind of base learner. In experiments on two classification pro...
متن کاملStacked Graphical Learning: Learning in Markov Random Fields using Very Short Inhomogeneous Markov Chains
We described stacked graphical learning, a meta-learning scheme in which a base learner is augmented by expanding one instance’s features with predictions on other related instances. The stacked graphical learning is efficient, especially during inference, capable of capturing dependencies easily, and can be constructed based on any kind of base learner. In experiments on two classification pro...
متن کاملStacked Graphical Models for Efficient Inference in Markov Random Fields
In collective classification, classes are predicted simultaneously for a group of related instances, rather than predicting a class for each instance separately. Collective classification has been widely used for classification on relational datasets. However, the inference procedure used in collective classification usually requires many iterations and thus is expensive. We propose stacked gra...
متن کاملThesis Proposal Stacked Graphical Learning
Traditional machine learning methods assume that instances are independent while in reality there are many relational datasets, such as hyperlinked web pages, scientific literatures with dependencies among citations, social networks, and more. Recent work on graphical models has demonstrated performance improvement on relational data. In my thesis I plan to study a meta-learning scheme called s...
متن کاملOnline Stacked Graphical Learning
Collective classification predicts class labels simultaneously for a group of related instances, rather than predicting a class for each instance separately. The existing collective classification methods are usually expensive due to the iterative inference in graphical models and their learning procedures based on iterative optimization. When the dataset is large, the cost of maintaining large...
متن کامل